Document classification について

Words near each other

・ Docufiction
・ DocuLex
・ Document
・ Document (album)
・ Document (disambiguation)
・ Document (TV series)
・ Document 1
・ Document 12-571-3570
・ Document 5 (album)
・ Document 7 (album)
・ Document 8
・ Document and Eyewitness
・ Document automation
・ Document camera
・ Document capture software
・ Document classification
・ Document clustering
・ Document collaboration
・ Document comparison
・ Document composition
・ Document Content Architecture
・ Document conversion
・ Document Definition Markup Language
・ Document dump
・ Document engineering
・ Document examiner
・ Document Exploitation (DOCEX)
・ Document file format
・ Document Freedom Day
・ Document imaging

Dictionary Lists

mini英和辞書

翻訳と辞書　辞書検索 [ 開発暫定版 ]

スポンサードリンク

Document classification ：ウィキペディア英語版

Document classification
Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems are overlapping, however, and there is therefore interdisciplinary research on document classification.
The documents to be classified may be texts, images, music, etc. Each kind of document possesses its special classification problems. When not otherwise specified, text classification is implied.
Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: the content-based approach and the request-based approach.
=="Content-based" versus "request-based" classification==
Content-based classification is classification in which the weight given to particular subjects in a document determines the class to which the document is assigned. It is, for example, a rule in much library classification that at least 20% of the content of a book should be about the class to which the book is assigned.〔Library of Congress (2008). The subject headings manual. Washington, DC.: Library of Congress, Policy and Standards Division. (Sheet H 180: "Assign headings only for topics that comprise at least 20% of the work.")〕 In automatic classification it could be the number of times given words appears in a document.
Request-oriented classification (or -indexing) is classification in which the anticipated request from users is influencing how documents are being classified. The classifier asks himself: “Under which descriptors should this entity be found?” and “think of all the possible queries and decide for which ones the entity at hand is relevant” (Soergel, 1985, p. 230〔Soergel, Dagobert (1985). Organizing information: Principles of data base and retrieval systems. Orlando, FL: Academic Press.〕).
Request-oriented classification may be classification that is targeted towards a particular audience or user group. For example, a library or a database for feminist studies may classify/index documents differently when compared to a historical library. It is probably better, however, to understand request-oriented classification as ''policy-based classification'': The classification is done according to some ideals and reflects the purpose of the library or database doing the classification. In this way it is not necessarily a kind of classification or indexing based on user studies. Only if empirical data about use or users are applied should request-oriented classification be regarded as a user-based approach.

抄文引用元・出典: フリー百科事典『ウィキペディア（Wikipedia）』
■ウィキペディアで「Document classification」の詳細全文を読む

スポンサードリンク

翻訳と辞書 : 翻訳のためのインターネットリソース